clean ratio
Data Stream Sampling with Fuzzy Task Boundaries and Noisy Labels
In the realm of continual learning, the presence of noisy labels within data streams represents a notable obstacle to model reliability and fairness. We focus on the data stream scenario outlined in pertinent literature, characterized by fuzzy task boundaries and noisy labels. To address this challenge, we introduce a novel and intuitive sampling method called Noisy Test Debiasing (NTD) to mitigate noisy labels in evolving data streams and establish a fair and robust continual learning algorithm. NTD is straightforward to implement, making it feasible across various scenarios. Our experiments benchmark four datasets, including two synthetic noise datasets (CIFAR10 and CIFAR100) and real-world noise datasets (mini-WebVision and Food-101N). The results validate the efficacy of NTD for online continual learning in scenarios with noisy labels in data streams. Compared to the previous leading approach, NTD achieves a training speedup enhancement over two times while maintaining or surpassing accuracy levels. Moreover, NTD utilizes less than one-fifth of the GPU memory resources compared to previous leading methods.
A Gradient-based Approach for Online Robust Deep Neural Network Training with Noisy Labels
Yang, Yifan, Koppel, Alec, Zhang, Zheng
Learning with noisy labels is an important topic for scalable training in many real-world scenarios. However, few previous research considers this problem in the online setting, where the arrival of data is streaming. In this paper, we propose a novel gradient-based approach to enable the detection of noisy labels for the online learning of model parameters, named Online Gradient-based Robust Selection (OGRS). In contrast to the previous sample selection approach for the offline training that requires the estimation of a clean ratio of the dataset before each epoch of training, OGRS can automatically select clean samples by steps of gradient update from datasets with varying clean ratios without changing the parameter setting. During the training process, the OGRS method selects clean samples at each iteration and feeds the selected sample to incrementally update the model parameters. We provide a detailed theoretical analysis to demonstrate data selection process is converging to the low-loss region of the sample space, by introducing and proving the sub-linear local Lagrangian regret of the non-convex constrained optimization problem. Experimental results show that it outperforms state-of-the-art methods in different settings.
- North America > United States > California > Santa Barbara County > Santa Barbara (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Arizona (0.04)